Learning from imbalanced data sets with a Min-Max modular support vector machine
Authors
Abstract
Imbalanced data sets have significantly unequal distributions between classes. This between-class imbalance causes conventional classification methods to favor the majority classes, resulting in very low or even no detection of the minority classes. A Min-Max modular support vector machine (M-SVM) approaches this problem by decomposing the training input sets of the majority classes into subsets of similar size and pairing them into balanced two-class classification subproblems. The approach has three merits: it works with general-purpose classifiers, it allows prior knowledge to be incorporated into task decomposition, and its subproblems can be learned in parallel. Experiments on two real-world pattern classification problems, international patent classification and protein subcellular localization, demonstrate the effectiveness of the proposed approach.
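The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a nearest-centroid scorer standing in for an SVM (the abstract notes that general classifiers can be used), the random partitioning and the function names `split_into_subsets`, `train_module`, `train_min_max`, and `predict_score` are hypothetical, and the combination rule is the usual min-max principle (minimum over modules that share a positive subset, then maximum across positive subsets), which the abstract itself does not spell out.

```python
import random


def split_into_subsets(samples, n_subsets, seed=0):
    """Randomly partition samples into n_subsets parts of near-equal size.

    Assumes n_subsets <= len(samples) so that no part is empty.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_module(pos, neg):
    """Stand-in base classifier (assumption: nearest centroid, not an SVM).

    Returns a scoring function; a score above 0 means "positive".
    """
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def train_min_max(pos, neg, n_pos, n_neg):
    """Train one module per (positive subset, negative subset) pair."""
    pos_parts = split_into_subsets(pos, n_pos)
    neg_parts = split_into_subsets(neg, n_neg)
    return [[train_module(p, n) for n in neg_parts] for p in pos_parts]


def predict_score(modules, x):
    """Min over each row (same positive subset), then max across rows."""
    return max(min(module(x) for module in row) for row in modules)


# Toy imbalanced problem: 10 minority points versus 100 majority points.
rng = random.Random(42)
minority = [[rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)] for _ in range(10)]
majority = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(100)]

# Keep the minority class whole and split the majority class into 10 parts,
# so every subproblem is a balanced 10-versus-10 task.
modules = train_min_max(minority, majority, n_pos=1, n_neg=10)
print(len(modules), len(modules[0]))
print(predict_score(modules, [2.0, 2.0]) > 0)
print(predict_score(modules, [0.0, 0.0]) > 0)
```

Because each module is trained on its own pair of subsets, the calls to `train_module` are independent of one another and could be dispatched to separate workers, which is the parallel-learning property the abstract mentions.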
Similar resources
On Mining Fuzzy Classification Rules for Imbalanced Data
The fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, so that it performs poorly on the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
A Selective Sampling Method for Imbalanced Data Learning on Support Vector Machines
The class imbalance problem in classification has been recognized as a significant research problem in recent years and a number of methods have been introduced to improve classification results. Rebalancing class distributions (such as over-sampling or under-sampling of learning datasets) has been popular due to its ease of implementation and relatively good performance. For the Support Vector...
A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate
The support vector machine (SVM) is a popular classification technique that classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which means that SVM training amounts to solving an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...
Task Decomposition Using Geometric Relation for Min-Max Modular SVMs
The min-max modular support vector machine (M-SVM) was proposed for dealing with large-scale pattern classification problems. M-SVM divides the training data into several subsets and combines them into a series of independent subproblems, which can be learned in parallel. In this paper, we explore the use of the geometric relation among training data in task decomposition. The experimental resu...